Part 3: Modelling with R

Author

Verónica Andreo

Published

April 2, 2023

In this third part of the studio, we’ll use R to model Aedes albopictus distribution in Northern Italy. For that, we need to connect to GRASS via the rgrass package by Bivand (2022) in order to read occurrence data and predictors.

rgrass

  • initGRASS(): starts a GRASS GIS session from R
  • execGRASS(): executes GRASS GIS commands
  • gmeta(): shows GRASS location metadata
  • read_VECT() and read_RAST(): read vector and raster maps from GRASS into terra objects.
  • write_VECT() and write_RAST(): write terra objects into GRASS GIS database
Note

Package terra is developed by Hijmans (2022) and will eventually replace raster.

GRASS GIS and R can be used together in two ways:

A. Using R within a GRASS GIS session, i.e. starting R (or RStudio) from GRASS terminal

  • type R or rstudio & in the GRASS GIS terminal
  • load rgrass library
  • use read_VECT(), read_RAST() to read data from GRASS into R
  • access GRASS GIS modules and database through execGRASS()
  • write data (back) to GRASS database with write_VECT() and write_RAST()

B. Using GRASS GIS within an R session, i.e. we connect to GRASS GIS database from within R (or RStudio).

  • we need to start GRASS GIS with initGRASS() from R
  • we access GRASS GIS modules through execGRASS()
  • use read_VECT(), read_RAST(), write_VECT() and write_RAST() to read data from and to GRASS database
Note

Originally intended to apply GRASS functions on data outside GRASS database; hence some prefer to create throw away locations

Let’s move to R

Load packages needed

library(rgrass)
library(sf)
library(terra)
library(raster)
library(mapview)
library(biomod2)
library(dismo)
library(usdm)
library(SDMtune)
library(zeallot)

Option B: Close GRASS, open Rstudio and run:

# path to GRASS binaries (run `grass --config path`)
grassbin <- "/usr/lib64/grass82"
# path to GRASS database
grassdata <- "/home/veroandreo/grass_ncsu_2023/grassdata/"
# path to location
location <- "eu_laea"
# path to mapset
mapset <- "italy_LST_daily"

# start GRASS GIS from R
initGRASS(gisBase = grassbin, 
          home = tempdir(), 
          gisDbase = grassdata, 
          location = location, 
          mapset = mapset, 
          override = TRUE,
          remove_GISRC= TRUE)

Read vector data

# Read vector layers
presence <- st_as_sf(read_VECT("aedes_albopictus"))
background <- st_as_sf(read_VECT("background_points"))

Read raster data

# List rasters by pattern
worldclim <- execGRASS("g.list", 
                       parameters = list(type = "raster", 
                                         pattern = "worldclim*"))
avg <- execGRASS("g.list", 
                 parameters = list(type = "raster", 
                                   pattern = "avg*"))
median <- execGRASS("g.list", 
                    parameters = list(type = "raster", 
                                      pattern = "median*", 
                                      exclude = "*[1-5]"))

# Concatenate map lists
to_import <- c(attributes(worldclim)$resOut, 
               attributes(avg)$resOut, 
               attributes(median)$resOut)

# Read raster layers
predictors <- list()
for (i in to_import){ 
  predictors[i] <- raster(read_RAST(i)) }

# Stack rasters
predictors_r <- raster::stack(predictors)
# Quick visualization in mapview
mapview(predictors_r[['worldclim_bio01']]) + presence

Data preparation

# Variables for models
sp <- "Aedes albopictus"
presence <- st_coordinates(presence)
background <- st_coordinates(background)
env <- predictors_r

# Prepare data: SWD
data_sp <- prepareSWD(species = sp, 
                      p = presence, 
                      a = background, 
                      env = env)

data_sp
## Object of class SWD 
## 
## Species: Aedes albopictus 
## Presence locations: 83 
## Absence locations: 1000 
## 
## Variables:
## ---------
## Continuous: worldclim_bio01 worldclim_bio02 worldclim_bio03 worldclim_bio04 worldclim_bio05 worldclim_bio06 worldclim_bio07 worldclim_bio10 worldclim_bio11 avg_autumnal_cooling avg_count_tmean_higher20_lower30 avg_spring_warming median_lower_m2_consec_days 
## Categorical: NA

Define relevant variables

seed=49
perc_test = 0.2
k = 4
method="Maxent"
cor_th=0.7
perm=10

Create train and test datasets

# Create training and test sets
c(train_sp, test_sp) %<-% 
  trainValTest(data_sp, 
               test = perc_test,
               only_presence = TRUE, 
               seed = seed)
train_sp
## Object of class SWD 
## 
## Species: Aedes albopictus 
## Presence locations: 66 
## Absence locations: 1000 
## 
## Variables:
## ---------
## Continuous: worldclim_bio01 worldclim_bio02 worldclim_bio03 worldclim_bio04 worldclim_bio05 worldclim_bio06 worldclim_bio07 worldclim_bio10 worldclim_bio11 avg_autumnal_cooling avg_count_tmean_higher20_lower30 avg_spring_warming median_lower_m2_consec_days 
## Categorical: NA
test_sp
## Object of class SWD 
## 
## Species: Aedes albopictus 
## Presence locations: 17 
## Absence locations: 1000 
## 
## Variables:
## ---------
## Continuous: worldclim_bio01 worldclim_bio02 worldclim_bio03 worldclim_bio04 worldclim_bio05 worldclim_bio06 worldclim_bio07 worldclim_bio10 worldclim_bio11 avg_autumnal_cooling avg_count_tmean_higher20_lower30 avg_spring_warming median_lower_m2_consec_days 
## Categorical: NA

Create folds for cross-validation

# Create folds 
ran_folds <- randomFolds(train_sp, 
                         k = k,
                         only_presence = TRUE, 
                         seed = seed)

Train a default Maxent model with CV

# Train a full model
full_model_sp <- train(method = method,
                       data = train_sp, 
                       folds = ran_folds)

full_model_sp
## Object of class SDMmodelCV 
## Method: Maxent 
## 
## Species: Aedes albopictus 
## Replicates: 4 
## Presence locations: 66 
## Absence locations: 1000 
## 
## Model configurations:
## --------------------
## fc: lqph
## reg: 1
## iter: 500
## 
## Variables:
## ---------
## Continuous: worldclim_bio01 worldclim_bio02 worldclim_bio03 worldclim_bio04 worldclim_bio05 worldclim_bio06 worldclim_bio07 worldclim_bio10 worldclim_bio11 avg_autumnal_cooling avg_count_tmean_higher20_lower30 avg_spring_warming median_lower_m2_consec_days 
## Categorical: NA
pred_full_model <- predict(full_model_sp,
                           data = env,
                           type = "cloglog")

mapview(pred_full_model)

Variable selection: remove highly correlated variables

# Prepare background locations to test correlation
bg_sp <- prepareSWD(species = sp, 
                    a = background,
                    env = env)

# Remove variables with correlation higher than 0.7 
# while accounting for the AUC
vs_sp <- varSel(full_model_sp,
                metric = "auc", 
                bg4cor = bg_sp, 
                cor_th = cor_th,
                permut = perm)
vs_sp@data
## Object of class SWD 
## 
## Species: Aedes albopictus 
## Presence locations: 66 
## Absence locations: 1000 
## 
## Variables:
## ---------
## Continuous: worldclim_bio04 worldclim_bio07 avg_count_tmean_higher20_lower30 avg_spring_warming 
## Categorical: NA

Remove less important variables

# remove less important variables only if auc does not decrease
reduc_var_sp <- reduceVar(vs_sp,
                          th = 10, 
                          metric = "auc", 
                          test = TRUE, 
                          permut = perm, 
                          use_jk = TRUE)

reduc_var_sp
## Object of class SDMmodelCV 
## Method: Maxent 
## 
## Species: Aedes albopictus 
## Replicates: 4 
## Presence locations: 66 
## Absence locations: 1000 
## 
## Model configurations:
## --------------------
## fc: lqph
## reg: 1
## iter: 500
## 
## Variables:
## ---------
## Continuous: worldclim_bio04 worldclim_bio07 avg_count_tmean_higher20_lower30 
## Categorical: NA

We need now to recreate SWD only with the selected variables to run the final model and make predictions.

# Get only relevant variables from the reduced model
retained_varnames <- names(reduc_var_sp@models[[1]]@data@data)

# Subset raster stack
env <- subset(env, retained_varnames)

# SWD with the selected vars
subset_train_sp <- prepareSWD(species = sp, 
                              p = presence,
                              a = background,
                              env = env)

c(train_sp, test_sp) %<-% 
  trainValTest(subset_train_sp, 
               test = perc_test, 
               only_presence = TRUE, 
               seed = seed)

Run the best models with the full train data

final_model_sp <- train(method = method, 
                        data = train_sp,
                        fc = reduc_var_sp@models[[1]]@model@fc,
                        reg = reduc_var_sp@models[[1]]@model@reg)

Make predictions with the final model

map_sp_maxent <- predict(final_model_sp,
                         data = env, 
                         type = "cloglog")

mapview(map_sp_maxent)

Write raster of final model predictions to GRASS

write_RAST(rast(map_sp_maxent), 
           "Aedes_albopictus_maxent", 
           flags = c("o","overwrite"))
## WARNING: Raster map <Aedes_albopictus_maxent> already exists and will be
##          overwritten
## Over-riding projection check
## Importing raster map <Aedes_albopictus_maxent>...
##    0%   3%   6%   9%  12%  15%  18%  21%  24%  27%  30%  33%  36%  39%  42%  45%  48%  51%  54%  57%  60%  63%  66%  69%  72%  75%  78%  81%  84%  87%  90%  93%  96%  99% 100%
execGRASS("g.list", parameters = list(type="raster",
                                      pattern="Aedes*"))
## Aedes_albopictus_maxent

Model evaluation

# AUC
auc_maxent <- auc(final_model_sp, test = test_sp)
auc_maxent
## [1] 0.8666471
# Threshold dependent evaluation
th_maxent <- thresholds(final_model_sp, 
                        type = "cloglog", 
                        test = test_sp)

knitr::kable(th_maxent, format = 'html', digits = 2)
Threshold Cloglog value Fractional predicted area Training omission rate Test omission rate P-values
Minimum training presence 0.04 0.95 0.00 0 0
Equal training sensitivity and specificity 0.50 0.28 0.27 0 0
Maximum training sensitivity plus specificity 0.51 0.24 0.30 0 0
Equal test sensitivity and specificity 0.53 0.18 0.47 0 0
Maximum test sensitivity plus specificity 0.53 0.17 0.47 0 0

Variable importance in final model

vi_model_sp <- maxentVarImp(final_model_sp)
vi_model_sp
##                           Variable Percent_contribution Permutation_importance
## 1 avg_count_tmean_higher20_lower30              73.4133                69.7786
## 2                  worldclim_bio07              21.2599                12.7664
## 3                  worldclim_bio04               5.3268                17.4550
plotVarImp(vi_model_sp)

Response curves in final model

my_rp <- function(i){
  plotResponse(reduc_var_sp, i)
}

plotlist <- lapply(retained_varnames, my_rp)

labels <- LETTERS[1:length(retained_varnames)]

ggpubr::ggarrange(plotlist = plotlist, labels = labels)

# close the mapset
unlink_.gislock()

Disclaimer

This is only a toy example and only the beginning…

  • other models to test
  • hyper-parameter tuning
  • variable selection and model selection
  • ensemble modeling
  • model validation with independent data
  • uncertainty: where we can predict with confidence
  • many other relevant packages:

References